feat(prompts): add capture_errors option for error tracking#427
feat(prompts): add capture_errors option for error tracking#427andrewm4894 wants to merge 3 commits intomainfrom
Conversation
…error tracking When capture_errors=True and a PostHog client is provided, prompt fetch failures are reported via client.capture_exception(). This surfaces prompt API errors in PostHog error tracking without affecting the existing fallback behavior (stale cache → fallback → raise).
posthog-python Compliance ReportDate: 2026-02-06 18:12:57 UTC
|
| Test | Status | Duration |
|---|---|---|
| Format Validation.Event Has Required Fields | ✅ | 514ms |
| Format Validation.Event Has Uuid | ✅ | 1507ms |
| Format Validation.Event Has Lib Properties | ✅ | 1506ms |
| Format Validation.Distinct Id Is String | ✅ | 1505ms |
| Format Validation.Token Is Present | ✅ | 1506ms |
| Format Validation.Custom Properties Preserved | ✅ | 1506ms |
| Format Validation.Event Has Timestamp | ✅ | 1506ms |
| Retry Behavior.Retries On 503 | ✅ | 7799ms |
| Retry Behavior.Does Not Retry On 400 | ✅ | 3504ms |
| Retry Behavior.Does Not Retry On 401 | ✅ | 3507ms |
| Retry Behavior.Respects Retry After Header | ❌ | 7202ms |
| Retry Behavior.Implements Backoff | ❌ | 20574ms |
| Retry Behavior.Retries On 500 | ✅ | 7069ms |
| Retry Behavior.Retries On 502 | ✅ | 6630ms |
| Retry Behavior.Retries On 504 | ✅ | 7315ms |
| Retry Behavior.Max Retries Respected | ✅ | 20018ms |
| Deduplication.Generates Unique Uuids | ✅ | 1007ms |
| Deduplication.Preserves Uuid On Retry | ✅ | 7362ms |
| Deduplication.Preserves Uuid And Timestamp On Retry | ✅ | 12656ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | ✅ | 6916ms |
| Deduplication.No Duplicate Events In Batch | ✅ | 1501ms |
| Deduplication.Different Events Have Different Uuids | ✅ | 1505ms |
| Compression.Sends Gzip When Enabled | ✅ | 1506ms |
| Batch Format.Uses Proper Batch Structure | ✅ | 1506ms |
| Batch Format.Flush With No Events Sends Nothing | ✅ | 1004ms |
| Batch Format.Multiple Events Batched Together | ✅ | 1504ms |
| Error Handling.Does Not Retry On 403 | ✅ | 3508ms |
| Error Handling.Does Not Retry On 413 | ✅ | 3505ms |
| Error Handling.Retries On 408 | ❌ | 6511ms |
Failures
retry_behavior.respects_retry_after_header
Retry delay too short: 693ms < 2500ms
retry_behavior.implements_backoff
First retry delay too short: 90ms < 100ms
error_handling.retries_on_408
Expected at least 2 requests, got 1
Radu-Raicea
left a comment
There was a problem hiding this comment.
Any way to support both instantiation methods?
| self._client = posthog if posthog is not None else None | ||
| self._capture_errors = capture_errors | ||
|
|
There was a problem hiding this comment.
Unchecked capture_exception call
_maybe_capture_error() assumes the provided posthog object has a capture_exception method. With capture_errors=True, passing a PostHog client version (or a stub) that lacks capture_exception will raise AttributeError, which then gets swallowed by the broad except Exception, meaning errors silently won’t be reported (and it’s hard to diagnose). Consider guarding the call (e.g., hasattr(self._client, "capture_exception")) and/or logging at least once at WARNING when the method is missing so users know why capture isn’t happening.
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/ai/prompts.py
Line: 97:99
Comment:
**Unchecked capture_exception call**
`_maybe_capture_error()` assumes the provided `posthog` object has a `capture_exception` method. With `capture_errors=True`, passing a PostHog client version (or a stub) that lacks `capture_exception` will raise `AttributeError`, which then gets swallowed by the broad `except Exception`, meaning errors silently won’t be reported (and it’s hard to diagnose). Consider guarding the call (e.g., `hasattr(self._client, "capture_exception")`) and/or logging at least once at WARNING when the method is missing so users know why capture isn’t happening.
How can I resolve this? If you propose a fix, please make it concise.| class TestPromptsCaptureErrors(TestPrompts): | ||
| """Tests for the capture_errors option.""" | ||
|
|
||
| @patch("posthog.ai.prompts._get_session") | ||
| def test_capture_exception_called_on_fetch_failure_with_fallback( |
There was a problem hiding this comment.
Tests are not parameterized
The new TestPromptsCaptureErrors cases are all the same shape (setup fetch failure/success → call get() → assert on capture_exception). This increases repetition and makes it easier to miss scenarios when adding new branches. The repo preference is parameterized tests; these can be collapsed into a single parameterized test that covers (fallback vs stale cache vs re-raise vs success) and toggles (capture_errors on/off, client present/absent).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/test/ai/test_prompts.py
Line: 512:516
Comment:
**Tests are not parameterized**
The new `TestPromptsCaptureErrors` cases are all the same shape (setup fetch failure/success → call `get()` → assert on `capture_exception`). This increases repetition and makes it easier to miss scenarios when adding new branches. The repo preference is parameterized tests; these can be collapsed into a single parameterized test that covers (fallback vs stale cache vs re-raise vs success) and toggles (`capture_errors` on/off, client present/absent).
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.|
We've updated our release process. We require |
|
@andrewm4894 is this still relevant? |
Yeah I think it kinda is worth supporting. Was going to sense check the sampo stuff on it and update to get it ready at some point |
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
| @@ -1 +1,4 @@ | |||
| VERSION = "7.11.2" | |||
| VERSION = "7.8.4" | |||
There was a problem hiding this comment.
| capture_errors: If True and a PostHog client is provided, prompt fetch | ||
| failures are reported to PostHog error tracking via capture_exception(). |
There was a problem hiding this comment.
so we are reporting our own errors?
what's the final user gonna do with this when those errors are in their error tracking tab?
There was a problem hiding this comment.
Good question — I think the framing is actually from the user's perspective, not ours. The errors surfaced here aren't "PostHog internals" — they're 404s (prompt typo / deleted), 403s (key perms), auth failures, and network errors. All actionable by the app developer. If my app's prompts were silently falling back for two weeks and PostHog didn't surface it, I'd be pretty upset. It's opt-in (capture_errors=False by default), runs before the fallback chain so visibility doesn't cost reliability, and swallows capture-side failures so it can't break the caller. WDYT?
There was a problem hiding this comment.
mmm I see, then yeah for a few errors that do make sense eg 404, 403 etc but not everything
eg network errors, timeouts, etc, this would be just noisy, affecting the noise x ratio error tracking quality cus those errors specifically arent actionable
can we at least filter only to capture the errors that are actually actionable? (this will depend on the API contract - http errors)
There was a problem hiding this comment.
Hmm if PostHog is down and my latest prompts aren't getting to my app you are saying to not error track that in my PostHog as a user? Do errors have to be actionable? Is that some sort of thing that users are aware or when using error tracking or something?
I'm not super up on error tracking fully tbh but feel a bit uncomfortable making such decisions here on behalf of future users (unless this kinda just is the norm or way error tracking works on posthog? Eg "actionable error tracking" where we do sort of try curate on your behalf, that's a bit different then).
Opt in too here as a gate helps maybe.
The noise problem then is something you manage in error tracking itself via the groupings and rules and AI no?
Or maybe there is some way to make the different error types configurable and use like a sane default more like what you say perhaps.
Although if my prod app was not getting latest prompts and error tracking was not surfacing that and it turned out PostHog prompts was just down and we missed it for a bit say, and then I found this thread where we decided to just not surface it. I'd be very mad.
Or am I being a bit stubborn here? Like I think of an error as an error, but of course it's always more nuanced where you draw the line so maybe that's the point here perhaps of where that is.
There was a problem hiding this comment.
I don't disagree with most of your comments, but if posthog is down, or your internet is down, there's not much you can do besides having a fallback on the client
also, capturing errors most likely wont go thru anyway since most likely that endpoint is also down, or you are shooting the server with more requests and causing even more problems
Error tracking should focus on actionable errors I can address on my end.
I wouldn't want to pay for error events that the only solution is "tell PostHog to be more reliable or tell me to have a more stable internet", but I am very opinionated about this, I know
The way I see this is more about "SDK diagnostics", we failed to request the prompts, and that's an SDK issue and not a customer issue.
I'm not familiar with this product, so I'll approve it and leave it with you. I just wanted to share my 2 cents
There was a problem hiding this comment.
Yeah I know what you mean, I'm kinda thinking of the potential code in here to cherry pick errors as an implicit suppression rule in PostHog error tracking.
Going to research a bit more on how it all works and what other parts of the SDK already do for stuff like this just to make sure I'm at least in line with them and things like that.
Eg does PostHog error tracking itself then maybe need some sort of new template for these new potential prompt errors etc.
Will check all that before merging to be sure.
|
@andrewm4894 you have to address this comment and fix the merge conflicts before merging |
|
Superseded by #520 — rebased onto current main (couldn't force-push this branch due to signed-commits ruleset flagging historical pre-sampo release commits). Addressed review feedback there: dropped the version bump (per @marandaneto), added sampo changeset, added hasattr guard on capture_exception (per Greptile). Review continues on #520. |
This is just an exploratory one. As a user i kinda might want posthog errors if we fail to handle prompt as expected. I dont think anything too crazy going on here but worth discussing.
Summary
capture_errorsparameter toPrompts.__init__()(defaultFalse, fully backward compatible)Trueand a PostHog client is provided, prompt fetch failures are reported viaclient.capture_exception()before the existing fallback logic runscapture_exception()itself fails, it is silently swallowed and never affects prompt resolutionMotivation
When using prompts with fallbacks, fetch failures are silently handled -- the caller gets a fallback string and the error is only logged at WARNING level. With
capture_errors=True, these failures surface in PostHog error tracking, making it easy to spot API issues, auth problems, or missing prompts without configuring Python logging.Usage
Changes
posthog/ai/prompts.py-- newcapture_errorsparam,_maybe_capture_error()helperposthog/test/ai/test_prompts.py-- 7 new tests covering all branchesTest plan